Explanations
phrases related to triggering events or actions
phrases that indicate triggering or initiating events
Negative Logits
Token       Logit
umbing      -0.75
basketball  -0.65
uddy        -0.62
Weld        -0.61
%]          -0.60
ockey       -0.59
Obj         -0.58
hai         -0.57
entirety    -0.57
mol         -0.57
Positive Logits
Token       Logit
cffffcc     0.87
nings       0.74
alarm       0.73
bolt        0.66
alarms      0.64
ragon       0.63
pist        0.63
havoc       0.63
DEF         0.62
anger       0.62
Activations Density: 0.022%