Explanations
statements expressing knowledge, agreement, and opinion
Negative Logits
Token            Logit
tartalomajánló   -1.27
Efq              -1.14
Majefty          -1.00
pleaſure         -0.99
houſe            -0.99
Houſe            -0.98
greateſt         -0.97
himſelf          -0.97
Theſe            -0.95
Reſ              -0.94
Positive Logits
Token            Logit
it               0.60
sure             0.60
think            0.55
гим              0.52
u                0.52
honestly         0.52
the              0.52
уж               0.51
if               0.49
tror             0.49
Activations Density: 0.241%