Explanations
phrases related to actions and decisions made by individuals
expressions of personal feelings and motivations
Negative Logits
arthed     -0.69
¶          -0.65
hub        -0.62
unison     -0.60
Browse     -0.58
Ranked     -0.58
moil       -0.57
§§         -0.56
heast      -0.55
ballots    -0.55
Positive Logits
himself         1.37
Himself         1.08
remorse         0.77
)."             0.74
herself         0.74
his             0.72
subordinates    0.69
repentance      0.67
buddies         0.65
behavi          0.63
Activations Density 1.032%