INDEX
Explanations
words related to vulnerabilities or weaknesses
references to weaknesses in various contexts
Negative Logits
agher      -0.92
rea        -0.84
ateur      -0.74
aston      -0.71
ahon       -0.71
iser       -0.70
aughlin    -0.68
zos        -0.68
ilon       -0.68
alogue     -0.67
Positive Logits
weakness      1.04
weaknesses    0.87
aversion      0.85
limitation    0.83
weaken        0.81
nesses        0.81
weakening     0.80
vulner        0.79
Weak          0.78
Weak          0.77
Activations Density: 0.009%