INDEX
Explanations
emotionally charged and evaluative words or phrases
descriptive phrases expressing strong opinions or emotions
Negative Logits
rongh         -0.82
execute       -0.78
idden         -0.78
obook         -0.76
orsi          -0.74
govtrack      -0.73
artney        -0.73
Downloadha    -0.73
bara          -0.72
foreseen      -0.71
Positive Logits
huh            1.21
eh             0.95
tho            0.88
coincidence    0.84
kidding        0.83
congr          0.81
downside       0.80
ya             0.72
!!             0.72
Kills          0.71
Activations Density 0.287%
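
As a rough illustration of how a density figure like the one above might be computed, here is a minimal sketch. It assumes that "activation density" means the fraction of sampled token positions on which the feature fires (activation strictly above a threshold); the source does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (number of active positions) / (positions sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the figure above
```

Under this reading, a density of 0.287% would mean the feature is active on roughly 3 of every 1,000 sampled tokens.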