INDEX
Explanations
locations or actions involving aggression or conflict
instances of physical interactions or confrontations
Negative Logits
Included    -0.62
Laksh       -0.61
mine        -0.61
anish       -0.59
Quantity    -0.58
fried       -0.57
orno        -0.57
sonian      -0.57
FIG         -0.56
Owner       -0.56
Positive Logits
whom        0.87
oba         0.71
armac       0.70
waivers     0.67
acebook     0.65
ventory     0.64
insky       0.64
stride      0.62
vl          0.62
subtle      0.60
Activations Density 0.580%