INDEX
Explanations
negative or critical comments
phrases that express skepticism or doubt
Negative Logits
ividual    -0.80
MAT        -0.67
alez       -0.66
mut        -0.64
hang       -0.63
eki        -0.63
emetery    -0.63
iom        -0.62
runs       -0.62
ulative    -0.62
Positive Logits
percept          0.88
bother           0.76
noticeable       0.73
withstanding     0.73
bothered         0.71
ever             0.70
paralle          0.70
distinguish      0.70
insignificant    0.70
shy              0.69
Activations Density: 0.009%