INDEX
Explanations
words related to behavior or actions
variations of the word "behave" in different contexts
Negative Logits
fram       -0.77
andel      -0.71
lake       -0.68
pelling    -0.67
Solo       -0.66
fer        -0.65
landing    -0.65
ondo       -0.64
Herz       -0.64
export     -0.63
Positive Logits
uate         1.01
iments       0.88
err          0.86
behavi       0.85
behaves      0.85
uated        0.84
uations      0.84
differently  0.82
ativity      0.81
behave       0.81
Activations Density: 0.029%