INDEX
Explanations
words signaling contrast or opposition
the word "meanwhile" and its contexts indicating ongoing or simultaneous actions
Negative Logits
Token       Logit
uto         -0.66
straw       -0.66
Freddy      -0.63
isable      -0.62
Affordable  -0.61
lic         -0.61
"""         -0.61
parole      -0.60
Columb      -0.60
enders      -0.59
Positive Logits
Token       Logit
æ©Ł         0.90
ðĿ          0.78
ãĤ´ãĥ³      0.73
ctr         0.72
CLASSIFIED  0.72
eredith     0.72
åĤ          0.70
å¯          0.70
ô           0.68
ï¸ı         0.68
Activations Density 0.010%