Explanations
instances of emphatic punctuation or statements
Negative Logits
iras     -0.15
uy       -0.14
undef    -0.14
inho     -0.14
akk      -0.14
Dabei    -0.14
rst      -0.14
реш      -0.14
aris     -0.14
untas    -0.14
Positive Logits
nor      0.54
rather   0.53
Rather   0.51
Nor      0.47
Rather   0.47
rather   0.46
Nor      0.43
nor      0.40
sondern  0.38
بلکه     0.36
Activations Density 0.135%