INDEX
Explanations
mentions of specific individuals or entities
determiners and descriptors related to people, roles, or entities in various contexts
Negative Logits
rison      -0.76
isks       -0.68
midt       -0.68
chuk       -0.66
ourses     -0.66
nesday     -0.66
ants       -0.63
antle      -0.62
Cow        -0.62
arks       -0.61
Positive Logits
pport       0.79
antit       0.78
TBD         0.72
Rated       0.72
prone       0.69
habitable   0.69
compatible  0.69
dstg        0.69
pired       0.68
quartered   0.67
Activations Density 0.365%