INDEX
Explanations
words related to vulnerabilities or weaknesses
mentions of weakness and vulnerabilities
Negative Logits
Token     Logit
rea       -0.93
agher     -0.87
ateur     -0.74
orate     -0.72
aston     -0.72
arel      -0.71
rolley    -0.68
cise      -0.67
alogue    -0.65
ahon      -0.65
Positive Logits
Token         Logit
weakness      1.09
weaknesses    0.92
aversion      0.85
Weak          0.83
nesses        0.82
undermin      0.82
limitation    0.79
Weak          0.78
weak          0.77
tolerance     0.76
Activations Density: 0.008%