INDEX
Explanations
Instances of the word "talk" and its variations, indicating communication.
Negative Logits
landfall    -0.78
anmar       -0.76
aples       -0.70
\/\/        -0.68
ritional    -0.67
handc       -0.67
arte        -0.67
unbeliev    -0.67
conflic     -0.66
PDATE       -0.64
Positive Logits
ership       1.16
ers          0.93
tion         0.82
Talk         0.81
bone         0.80
edin         0.78
iba          0.77
hips         0.76
eth          0.74
about        0.74
Activation Density: 0.008%