INDEX
Explanations
expressing empathy and concern
Negative Logits
عبادت  0.42
ំព  0.40
fired  0.40
Thrown  0.39
䨐  0.38
sending  0.37
stopping  0.37
োর্স  0.37
ಟರ್  0.37
Afrika  0.36
Positive Logits
első  0.50
eerst  0.50
pertama  0.49
belonged  0.44
hearts  0.43
belongs  0.43
belong  0.42
främ  0.42
Bick  0.42
первую  0.41
Activations Density: 0.003%