INDEX
Explanations
references to specific states or conditions
Negative Logits
ersh    -0.16
ird     -0.16
IRD     -0.16
akin    -0.15
uran    -0.14
ssp     -0.14
odos    -0.14
ause    -0.14
ulk     -0.14
spit    -0.14
Positive Logits
affairs     0.58
Affairs     0.45
affair      0.42
play        0.31
mind        0.31
flux        0.30
Flux        0.27
mind        0.27
emergency   0.26
Siege       0.26
Activations Density: 0.021%