INDEX
Explanations
terms and phrases related to violence and conflict resolution
Negative Logits
Token         Logit
Supporting    -0.20
Teaching      -0.17
erb           -0.16
Handling      -0.16
Watching      -0.16
Searching     -0.15
Handling      -0.15
Viewing       -0.15
Sending       -0.15
ivec          -0.15
Positive Logits
Token         Logit
becoming      0.37
developing    0.35
coming        0.34
going         0.31
getting       0.31
disappearing  0.30
falling       0.29
growing       0.28
turning       0.28
appearing     0.28
Activations Density: 0.726%