Explanations
noun followed by verb/preposition
Negative Logits
Token    Value
:        1.15
the      1.02
The      0.72
ه        0.65
のカ     0.63
;        0.63
four     0.63
the      0.62
᱖        0.62
:“       0.61
Positive Logits
Token    Value
大       0.59
I        0.58
ILL      0.58
Jahr     0.58
on       0.51
Bezirk   0.51
ON       0.49
听       0.49
Adoles   0.48
Bezir    0.48
Activations Density: 1.303%