INDEX
Explanations
names of individuals
proper nouns and names
Negative Logits
Token        Logit
envy         -0.69
âĶĢâĶĢ       -0.60
plates       -0.57
ambassadors  -0.53
rivals       -0.50
ModLoader    -0.50
CONT         -0.50
uries        -0.50
cubes        -0.50
Curiosity    -0.50
Positive Logits
Token        Logit
kowski       0.96
zinski       0.93
inski        0.92
ovich        0.91
chuk         0.91
nick         0.89
Jr           0.88
ansky        0.87
owski        0.86
meier        0.84
Activations Density 0.333%