INDEX
Explanations
verbs expressing attribution or categorization
references to actions or statements made about people and their opinions
Negative Logits
maxwell     -0.64
Shock       -0.59
Revival     -0.57
reactive    -0.57
ceasefire   -0.57
Zeit        -0.57
DPR         -0.56
Axel        -0.56
sts         -0.55
resy        -0.55
Positive Logits
course      0.81
dearly      0.78
abouts      0.73
deems       0.70
deem        0.68
unes        0.68
cherish     0.67
ãĥ¼ãĤ¯      0.65
ģĸ          0.64
968         0.64
Activations Density 0.443%
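
Two of the positive-logit tokens (ãĥ¼ãĤ¯ and ģĸ) look like vocabulary entries from a byte-level BPE tokenizer, where each raw byte is shown as a printable stand-in character. The sketch below assumes a GPT-2 style byte-to-character table, which the page itself does not confirm; the helper names `bytes_to_unicode`, `token_bytes`, and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer uses this mapping.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed vocabulary token.

    Raises KeyError for characters outside the table (e.g. a literal space).
    """
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a displayed token to text; invalid UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["course", "ãĥ¼ãĤ¯", "ģĸ"]:
        print(repr(tok), token_bytes(tok), repr(decode_token(tok)))
```

Under this assumption, ãĥ¼ãĤ¯ decodes to the katakana pair ーク, while ģĸ maps to two UTF-8 continuation bytes that form no complete character on their own, so it is likely a fragment of a longer multi-byte sequence.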