INDEX
Explanations
sentences or phrases ending with 'report that'
repeated punctuation marks or periods
Negative Logits
ãĥ´      -0.74
ighter   -0.71
infl     -0.70
å¼       -0.67
ãĥģ      -0.61
overs    -0.61
ktop     -0.60
ãĥ«      -0.58
mith     -0.57
GN       -0.57
Positive Logits
shall    0.76
selves   0.68
."       0.66
TAG      0.66
hello    0.64
IAN      0.63
safe     0.60
Oops     0.60
stocks   0.60
respect  0.60
Activations Density 0.019%