INDEX
Explanations
phrases related to human behavior
mentions of behavior, particularly in a social or psychological context
Negative Logits
endiary    -0.70
anmar      -0.69
gur        -0.67
enegger    -0.66
vette      -0.66
Lank       -0.66
ondo       -0.64
mand       -0.64
ixed       -0.64
inite      -0.64
Positive Logits
behaviors       1.00
behavior        0.98
modification    0.98
behaviours      0.96
uation          0.93
behaviour       0.91
aviour          0.90
avior           0.90
uate            0.89
patterns        0.89
Activations Density: 0.040%