INDEX
Explanations
phrases related to confrontation or criticism
references to physical confrontations or reprimands in various contexts
Negative Logits
GREEN       -0.64
arcity      -0.63
TABLE       -0.63
"},"        -0.59
Lafayette   -0.59
CRE         -0.59
izable      -0.59
encl        -0.58
Lower       -0.58
}:          -0.57
Positive Logits
=-=-        0.68
viks        0.63
steroids    0.63
blender     0.62
shit        0.62
Ïī          0.62
cheek       0.61
undermin    0.61
pill        0.61
punches     0.60
Activations Density: 0.533%