INDEX
Explanations
references to specific films and characters in media
Negative Logits
owie      -0.15
áli       -0.15
@}        -0.14
ách       -0.14
Ú©ÙĦÛĮ    -0.14
ergus     -0.14
ä¹        -0.13
působ     -0.13
Claw      -0.13
lyn       -0.13
Positive Logits
Avatar    0.33
Avatar    0.27
avatar    0.25
bending   0.24
Air       0.23
Roku      0.23
Nickel    0.23
Fire      0.22
avatar    0.21
Sok       0.20
Activations Density 0.001%
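
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires at all. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: "active" means the activation is strictly above `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 100,000 token positions.
acts = [0.0] * 99_999 + [4.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

With these made-up activations the ratio is 1 in 100,000, which is the same order of magnitude as the density reported here; a feature this sparse fires only on a very narrow set of contexts.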