INDEX
Explanations
phrases related to categorization or comparison
phrases indicating membership or inclusion within a group or concept
Negative Logits
ahime       -0.83
lyak        -0.74
Locations   -0.61
iland       -0.61
berry       -0.61
ischer      -0.61
ortment     -0.60
ugi         -0.60
BAT         -0.60
soType      -0.59
Positive Logits
whatsoever    1.49
nor           1.33
anymore       1.00
dime          0.95
except        0.91
necessarily   0.85
bothered      0.83
slightest     0.83
EVER          0.80
anybody       0.79
Activations Density: 0.131%