INDEX
Explanations
verbs, especially those related to motion and action
terms related to legal or political concepts
Negative Logits
Reloaded       -0.69
Daddy          -0.63
Luck           -0.63
ALLY           -0.61
prematurely    -0.61
WT             -0.60
Water          -0.60
Textures       -0.60
Flash          -0.58
Cold           -0.58
Positive Logits
itud           1.00
ande           0.93
rique          0.86
itent          0.85
idas           0.84
ó              0.84
én             0.83
ét             0.81
ibli           0.80
ici            0.80
Activations Density 0.209%