INDEX
Explanations
words indicating movement or spatial positions
away, up, fire, ground, working, invented
Negative Logits
تضيفلها       -0.66
              -0.64
Atsauces      -0.60
kasarigan     -0.59
typelib       -0.58
voran         -0.58
něž           -0.57
RTEX          -0.57
burgeoning    -0.56
modelBuilder  -0.56
Positive Logits
airplanes     0.45
FUCK          0.45
everybody     0.44
Anſ           0.43
stuff         0.43
airplane      0.42
spion         0.41
Inſ           0.41
STUFF         0.40
Everybody     0.40
Activations Density 0.067%