Explanations
absence of content or significant activation within text segments
Negative Logits
er      -0.98
en      -0.87
ster    -0.74
g       -0.73
ly      -0.71
gen     -0.70
mon     -0.70
k       -0.69
le      -0.69
r       -0.69
Positive Logits
་་              1.08
CURIAM          1.04
Мексичка        1.00
MFG             0.95
iented          0.94
hematical       0.94
">😂             0.94
betweenstory    0.93
preſent         0.92
HTT             0.92
Activations Density: 0.033%