Explanations
occurrences where something goes wrong or there is a negative outcome
phrases that repeatedly use the word "only."
NEGATIVE LOGITS
insula         -0.74
ahime          -0.74
idon           -0.70
lass           -0.68
align          -0.66
multipl        -0.66
hement         -0.65
ducers         -0.64
PK             -0.63
arthy          -0.61
POSITIVE LOGITS
marginally      0.99
kidding         0.94
incidentally    0.84
spor            0.81
seconds         0.80
minutes         0.79
surpassed       0.75
vaguely         0.74
days            0.72
briefly         0.72
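A minimal sketch of how lists like the two above are commonly produced, assuming (this page does not say so) that each value is the feature's direct effect on a token's logit, i.e. the feature's decoder direction dotted with that token's column of the unembedding matrix. Any normalization behind the exact numbers is not stated here. The names `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding column (assumed definition of 'logit')."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy 2-dimensional example (made-up numbers, not taken from this feature).
decoder_dir = [1.0, -0.5]
unembed = {
    "briefly": [0.6, -0.2],
    "minutes": [0.5, 0.0],
    "align": [-0.4, 0.3],
    "PK": [0.0, 0.8],
}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", [t for t, _ in pos])
print("negative:", [t for t, _ in neg])
```

With a real model the decoder direction and unembedding matrix would come from the trained autoencoder and the language model; the toy dictionaries here only show the ranking step.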
Activations Density 0.066%
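A short sketch of what a density of 0.066% means, assuming (not stated on this page) that density is the fraction of sampled tokens on which the feature's activation is above zero. Under that reading the feature fires on roughly 66 of every 100,000 tokens. The function name `activation_density` and the threshold parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Toy data: 66 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 66 * 1000, 1000):
    acts[i] = 1.5
print(f"{activation_density(acts):.3%}")  # prints 0.066%
```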