INDEX
Explanations
phrases related to environmental vulnerability
Negative Logits
Token       Logit
uably       -0.88
ãģ®éŃĶ      -0.77
arta        -0.76
notations   -0.74
verty       -0.73
ä           -0.70
miah        -0.69
ials        -0.69
pring       -0.69
Leary       -0.68
Positive Logits
Token          Logit
temptation     1.11
ridicule       0.95
attack         0.95
fend           0.94
criticism      0.94
extinction     0.89
manipulation   0.88
sabotage       0.88
overhe         0.88
defend         0.86
Activations Density: 0.079%