Explanations
words related to loss or absence
gone or visible
Negative Logits
للمعارف          -0.45
bership          -0.40
chard            -0.39
chartInstance    -0.39
ckså             -0.38
wendigkeit       -0.37
\">\             -0.37
Abby             -0.36
ddelweddau       -0.35
Guard            -0.35
Positive Logits
GONE             1.55
Gone             1.44
Gone             1.31
gone             1.11
VISIBLE          1.09
GONE             1.05
gone             0.69
visible          0.53
visible          0.52
rungsseite       0.52
Activations Density: 0.001%