INDEX
Explanations
numerical data and mentions of historical dates
Negative Logits
Token     Logit
ANGER     -0.14
ãģĵãģĿ    -0.14
Χ         -0.14
rive      -0.14
çĴĥ       -0.13
stoi      -0.13
Worce     -0.13
regor     -0.13
OLOR      -0.13
ãģ¬       -0.13
Positive Logits
Token     Logit
194       0.20
195       0.20
196       0.19
fo        0.18
190       0.17
pr        0.16
197       0.16
193       0.16
189       0.16
191       0.15
Activations Density 0.077%
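
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard, which may compute the statistic differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    activation. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 77 of which activate.
sample = [0.0] * 100_000
for i in range(0, 100_000, 1300):  # 77 evenly spaced positions
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.077%"
```

A feature this sparse fires on fewer than one position in a thousand, which fits a feature keyed to a narrow class of tokens such as year prefixes.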