INDEX
Explanations
judgmental expressions indicating disapproval or criticism
instances of the word "hardly" indicating minimal impact or significance
Negative Logits
ividual    -0.83
spin       -0.69
emy        -0.68
hang       -0.65
MAT        -0.65
runs       -0.64
iem        -0.64
alez       -0.64
yne        -0.63
eki        -0.61
Positive Logits
percept         0.84
bother          0.78
noticeable      0.73
paralle         0.73
bothered        0.72
conceivable     0.72
distinguish     0.71
withstanding    0.71
shy             0.69
ever            0.69
Activations Density: 0.018%