INDEX
Explanations
action verbs related to human behavior
action verbs indicating influence or interaction
Negative Logits
iscons       -0.70
issance      -0.65
destro       -0.64
encount      -0.61
agre         -0.59
COURT        -0.59
Instance     -0.59
enthusi      -0.58
afore        -0.58
APPLIC       -0.57
Positive Logits
ings         1.01
ingly        0.92
eth          0.83
adelphia     0.81
themselves   0.76
fully        0.74
INGS         0.71
geries       0.70
ments        0.70
able         0.70
Activations Density: 0.522%