INDEX
Explanations
phrases and structures indicating measurement or quantification
Negative Logits
horn      -0.17
anc       -0.16
elia      -0.16
estr      -0.16
again     -0.16
eses      -0.16
intern    -0.15
orig      -0.15
again     -0.15
erm       -0.15
Positive Logits
oling          0.17
sesso          0.16
RIPT           0.16
okies          0.15
inski          0.15
amarin         0.14
-horizontal    0.14
isposable      0.13
ityEngine      0.13
quel           0.13
Activations Density 0.026%