INDEX
Explanations
intensifiers that emphasize degrees of certainty or strength, particularly the word "very"
Negative Logits
joking       -0.66
Meeting      -0.65
oresc        -0.63
Trojan       -0.61
messenger    -0.61
Coming       -0.60
hanging      -0.60
theater      -0.59
ringing      -0.59
Cop          -0.57
Positive Logits
rouse        0.98
berra        0.83
rued         0.82
rely         0.81
afford       0.78
likely       0.77
nels         0.76
fill         0.76
pled         0.75
ially        0.75
Activations Density 0.039%
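
The density figure can be read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the page does not say how the figure was computed, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 39 of 100,000 tokens.
sample = [1.5] * 39 + [0.0] * (100_000 - 39)
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample the feature is active on 39 of 100,000 positions, which is the same order of sparsity as the value reported above.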