Explanations
Words related to behavioral observations and changes
Negative Logits
Laurens    -0.84
Nip        -0.79
Milne      -0.78
fairest    -0.71
tigma      -0.70
Donne      -0.70
Tup        -0.69
Lons       -0.69
risten     -0.69
glomer     -0.69
Positive Logits
behavior      1.80
behaviour     1.69
behaviors     1.60
Behavior      1.57
behavior      1.56
behaviours    1.51
BEHAVIOR      1.47
behaviour     1.44
Behavior      1.43
Behaviour     1.42
Activation Density: 0.117%