INDEX
Explanations
specific names, terms, and punctuation that indicate engagement or interaction
Negative Logits
ivent       -0.15
iffin       -0.15
Cop         -0.15
ForObject   -0.14
errer       -0.14
ies         -0.14
oulder      -0.14
cop         -0.14
cop         -0.14
ück         -0.14
Positive Logits
Above       0.18
_above      0.18
above       0.18
ABOVE       0.18
above       0.17
onus        0.17
Above       0.16
Ìģc         0.15
енÑģ        0.14
GBT         0.14
Activations Density 0.024%