INDEX
Explanations
phrases related to physical attributes or actions
references to physical conditions or physical experiences
Negative Logits
Token           Logit
ffic            -0.68
ered            -0.68
Tul             -0.68
ilts            -0.67
andel           -0.66
lr              -0.65
stakes          -0.65
Aval            -0.65
Gos             -0.65
Chance          -0.64
Positive Logits
Token           Logit
anguage         0.82
assaulted       0.80
cumbers         0.80
assaulting      0.76
illiter         0.74
speaking        0.73
handic          0.72
altercation     0.69
TPPStreamerBot  0.69
conclud         0.68
Activations Density 0.005%