INDEX
Explanations
phrases related to political news or public figures
phrases that include significant punctuation marks or separators
Negative Logits
¬¼        -0.71
UF        -0.68
minster   -0.66
ocl       -0.60
acent     -0.59
aha       -0.59
ensation  -0.59
iple      -0.59
onomous   -0.58
acan      -0.58
Positive Logits
huh       1.10
etc       1.00
albeit    0.95
lest      0.91
eh        0.90
aka       0.88
namely    0.83
haha      0.82
but       0.79
whereas   0.79
Activations Density 0.645%