Explanations
mentions of strength or intensity
Repeated instances of the word "strong" in various contexts
Negative Logits
Token            Logit
oleon            -0.75
unfocusedRange   -0.72
Newly            -0.72
regon            -0.71
Correction       -0.70
Sloan            -0.68
Wonderland       -0.66
ilege            -0.66
̶                -0.64
Hebdo            -0.63
Positive Logits
Token            Logit
believer         0.88
nesses           0.86
man              0.85
circumst         0.84
enough           0.84
arm              0.83
proponent        0.83
enough           0.81
deterrent        0.81
indication       0.81
Activations Density 0.061%