INDEX
Explanations
mentions of the word "Avatar"
references to the video game "Atari" and related franchises
Negative Logits
esters          -0.85
ortun           -0.81
giving          -0.70
mington         -0.69
stab            -0.67
uter            -0.67
Dodd            -0.67
ec              -0.66
found           -0.66
cker            -0.66
Positive Logits
Korra           1.06
atar            1.03
atars           0.91
Roku            0.86
idon            0.86
Avatar          0.83
âĹ¼             0.81
DragonMagazine  0.78
inho            0.76
abal            0.76
Activations Density 0.029%