Explanations
references to individual authors or researchers
Negative Logits
Token       Logit
Beſ         -0.76
Theſe       -0.75
Diſ         -0.74
ſeveral     -0.73
Reſ         -0.70
ſee         -0.70
Anſ         -0.68
reaſon      -0.68
―――――       -0.68
(empty)     -0.65
Positive Logits
Token       Logit
J           2.07
J           1.74
j           1.48
j           1.00
L           0.88
M           0.86
Jj          0.85
تضيفلها     0.84
qJ          0.84
JJ          0.84
Activations Density 0.079%