INDEX
Explanations
words related to weakness or vulnerability
references to weakness or frailty
Negative Logits
ICAN         -0.81
APH          -0.75
agher        -0.74
andise       -0.71
alogue       -0.71
Noir         -0.69
Sloan        -0.68
Andromeda    -0.68
McCann       -0.68
ICA          -0.67
Positive Logits
nesses       1.27
lings        1.18
ling         0.99
ening        0.92
ens          0.91
ener         0.87
ened         0.86
est          0.85
les          0.84
minded       0.82
Activations Density: 0.013%