Explanations
specific descriptions of physical interactions, especially involving grabbing and physical features like hair or face
instances of aggressive actions or physical confrontations
Negative Logits
eur       -0.50
Ital      -0.46
dracon    -0.46
inos      -0.45
negro     -0.44
Poles     -0.43
Ain       -0.43
obar      -0.43
Negro     -0.43
Pigs      -0.42
Positive Logits
覚醒      0.60
":-       0.56
フォ      0.52
bryce     0.49
.","      0.49
icity     0.48
%:        0.48
龍契士    0.48
osion     0.47
ranging   0.46
Activations Density 0.308%
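
One way to read the density figure: assuming it is the fraction of sampled token positions on which the feature's activation is nonzero, it could be computed roughly as below. This is a minimal sketch under that assumption, not the dashboard's actual code; the function name `activation_density`, the `threshold` parameter, and the toy sample are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 3 of 1000 token positions are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.308% would mean the feature fires on roughly 3 of every 1000 sampled tokens.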