INDEX
Explanations
descriptions of actions and events involving people in various locations
Negative Logits
ema             -0.69
phies           -0.66
pora            -0.66
redo            -0.65
ulnerability    -0.63
ogene           -0.62
nea             -0.62
ooting          -0.61
cknowled        -0.60
cia             -0.60
Positive Logits
frantically     0.69
Sov             0.68
Ern             0.67
eyed            0.65
furiously       0.64
exha            0.63
dangerously     0.63
itored          0.61
RAG             0.61
redients        0.60
Activations Density: 0.223%