INDEX
Explanations
references to newspapers
the term "newspaper" or references to print media
Negative Logits
Token        Logit
/            -0.75
aris         -0.72
//           -0.71
thank        -0.71
alions       -0.69
antes        -0.66
BLE          -0.66
ihilation    -0.65
ographies    -0.65
cent         -0.64
Positive Logits
Token        Logit
intuition    0.75
hitch        0.70
bath         0.62
ALT          0.61
intu         0.59
overnight    0.58
wav          0.58
snowball     0.57
yogurt       0.57
cider        0.57
Activations Density: 0.000%