INDEX
Explanations
repeated phrases or actions, possibly in a negative context such as mistakes or frustrations
Negative Logits
Token         Logit
wake          -0.48
nostic        -0.47
urden         -0.46
olphins       -0.45
card          -0.45
anza          -0.44
sson          -0.44
rain          -0.43
iaz           -0.43
Crane         -0.43
Positive Logits
Token         Logit
rogen         0.63
rogens        0.63
then          0.57
forth         0.55
romeda        0.54
consequently  0.52
alus          0.52
heals         0.51
parcel        0.49
THEN          0.48
Activations Density: 7.856%