Explanations
proper nouns or specific names mentioned in the text
Negative Logits
Token            Logit
Efq              -0.89
myſelf           -0.88
itſelf           -0.86
Italijanski      -0.86
houſe            -0.84
صوتيه            -0.83
Monfieur         -0.82
UnsafeEnabled    -0.81
PreferredItem    -0.81
doubtnut         -0.81
Positive Logits
Token            Logit
El               0.49
en               0.48
ID               0.47
H                0.47
’                0.46
al               0.46
Ra               0.45
i                0.45
ra               0.43
r                0.43
Activations Density 0.193%