INDEX
Explanations
expressions related to emotional distress or suicidal thoughts
actions (verbs related to taking, giving, sending)
Negative Logits
Datuak    -0.56
UrlResolution    -0.54
principalColumn    -0.52
للمعارف    -0.51
rungsseite    -0.49
onData    -0.47
клопе    -0.47
uVar    -0.46
genossen    -0.45
vettor    -0.45
Positive Logits
り    0.75
取り    0.59
け    0.56
り    0.55
し    0.54
立ち    0.54
書き    0.53
を作り    0.53
き    0.50
ぎ    0.50
Activations Density: 0.010%