Explanations
words related to human behavior
references to behavioral patterns or conduct
Negative Logits
ixed      -0.73
avez      -0.72
kos       -0.72
shall     -0.70
chin      -0.70
source    -0.69
plus      -0.69
bring     -0.68
occup     -0.68
forth     -0.67
Positive Logits
behavior        1.35
behaviors       1.22
behavi          1.19
Behavior        1.12
behaviour       1.11
avior           1.10
behaviours      1.08
behavior        0.99
modification    0.97
behav           0.85
Activations Density: 0.012%