INDEX
Explanations
references to textual content, such as text messages or written text
instances of the word "text."
Negative Logits
pload               -0.71
vre                 -0.70
CVE                 -0.67
dL                  -0.67
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.66
clamp               -0.64
Drone               -0.62
notor               -0.61
avement             -0.61
Parenthood          -0.60
Positive Logits
text                1.15
texts               1.13
text                1.05
ured                1.04
books               0.96
Text                0.94
book                0.93
area                0.90
messages            0.88
urized              0.88
Activations Density 0.012%