Explanations
locations or place names, specifically ones ending in "der"
references to entities or groups, particularly in a context of comparison or categorization
Negative Logits
ured      -0.69
reprodu   -0.65
urious    -0.63
filib     -0.62
urers     -0.61
crim      -0.60
veter     -0.58
Mehran    -0.57
showc     -0.57
rosse     -0.55

Positive Logits
theless    1.31
dash       1.25
mere       1.14
bolt       1.06
wise       0.99
lust       0.99
dale       0.95
mia        0.95
side       0.95
pool       0.95
Activations Density: 0.086%
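
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means share of token positions with a nonzero
    (here: > threshold) feature activation. The source does not confirm this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, feature active on 86 of them.
toy = [0.0] * 100_000
for i in range(86):
    toy[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% indicates a sparse feature that is silent on the vast majority of tokens.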