INDEX
Explanations
various forms of the verb "to rate" and related terms
NEGATIVE LOGITS
sters     -0.80
aldi      -0.74
thing     -0.72
fully     -0.71
older     -0.70
adra      -0.69
enberg    -0.68
while     -0.68
stru      -0.68
ned       -0.68
POSITIVE LOGITS
eals          0.75
aily          0.72
BY            0.72
irect         0.70
ombat         0.69
version       0.68
ict           0.66
escription    0.65
urally        0.65
medicine      0.64
Activations Density 0.084%