Explanations
expressions of agreement and consensus
Negative Logits
Efq             -1.17
Theſe           -1.17
Houſe           -1.12
itſelf          -1.11
Jefus           -1.09
Monfieur        -1.08
Anſ             -1.05
Reſ             -1.03
Beſ             -1.00
myſelf          -0.99
Positive Logits
ו               0.78
[not rendered]  0.69
agreed          0.68
(               0.65
R               0.62
agre            0.60
A               0.59
agree           0.59
AGRE            0.58
<eos>           0.58
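Lists like the negative and positive logits above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that approach. The page does not state its exact procedure, and real pipelines may also fold in layer norm or other scaling. The toy vocabulary, the vectors, the [d_model][vocab_size] layout, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score each token by the dot product of the feature's decoder vector
    with that token's column of the unembedding matrix.

    Assumption: `unembed` is laid out as [d_model][vocab_size]."""
    d_model = len(decoder_vec)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy data: d_model = 2, vocabulary of 4 tokens.
TOY_VOCAB = ["agree", "agreed", "House", "myself"]
TOY_DECODER = [1.0, 0.5]
TOY_UNEMBED = [
    [0.6, 0.4, -0.8, -0.7],
    [0.2, 0.4, -0.4, -0.2],
]

if __name__ == "__main__":
    neg, pos = top_and_bottom(logit_effects(TOY_DECODER, TOY_UNEMBED, TOY_VOCAB), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With the toy data, "House" and "myself" land in the negative list and "agree" and "agreed" in the positive list, mirroring the shape (not the values) of the lists above.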
Activation Density: 0.163%
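Activation density is typically the fraction of sampled token positions on which the feature's activation is above zero. By that reading, 0.163% corresponds to roughly 1.6 activations per 1,000 tokens. The following is a minimal sketch, assuming a zero threshold and a flat list of per-token activations (both assumptions). The helpers `activation_density` and `format_percent` are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_percent(fraction: float, digits: int = 3) -> str:
    """Render a fraction as a percentage, e.g. 0.00163 -> '0.163%'."""
    return f"{fraction * 100:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical sample: 163 active positions out of 100,000 tokens.
    acts = [0.0] * (100_000 - 163) + [1.0] * 163
    print(format_percent(activation_density(acts)))  # prints 0.163%
```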