INDEX
Explanations
sentences ending with a colon followed by a statement
statements or quotes made by individuals
Negative Logits
Token         Logit
trouble       -0.69
troubles      -0.68
giveaway      -0.64
delinquent    -0.63
overl         -0.61
controversy   -0.60
swaps         -0.59
shif          -0.58
stabilized    -0.58
popularity    -0.57
Positive Logits
Token         Logit
"…            1.27
"...          1.18
"[            1.15
"'            1.09
'[            1.08
""            1.02
"(            0.98
"@            0.97
".            0.96
"#            0.86
Activations Density 0.092%
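
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the page does not define the statistic, so treating "active" as "activation greater than zero" is an assumption, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active, and "active" means strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.092% would correspond to roughly 9 active positions per 10,000 tokens.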