Explanations
mentions of exclusivity or singularity
instances of the word "only."
Negative Logits
Token      Logit
anton      -0.68
dilig      -0.63
quez       -0.62
yout       -0.59
contend    -0.59
mares      -0.58
shore      -0.58
insula     -0.57
tree       -0.56
IDA        -0.56
Positive Logits
Token      Logit
only       3.04
ONLY       2.51
only       2.39
Only       2.09
Only       2.08
merely     1.55
sole       1.32
hardly     1.23
solely     1.19
neither    1.17
Activations Density: 0.082%