Explanations
phrases related to behavior or conduct
references to styles or methods of behavior
NEGATIVE LOGITS
rament          -0.78
hemat           -0.71
TOP             -0.69
Barton          -0.67
tek             -0.66
Neighborhood    -0.66
Wink            -0.63
STATS           -0.63
Dust            -0.62
Lyn             -0.61
POSITIVE LOGITS
isms            1.14
othy            0.84
ality           0.80
dictated        0.79
ACTIONS         0.74
abus            0.73
ism             0.69
able            0.68
istic           0.67
consistent      0.66
Activations Density 0.017%