Explanations
the word "switch" with a high activation value
instances of the word "switch."
Negative Logits
Token      Logit
za         -0.81
apolis     -0.72
ORED       -0.65
kamp       -0.65
nutrition  -0.64
vez        -0.63
fare       -0.62
USD        -0.62
our        -0.62
zza        -0.61
Positive Logits
Token      Logit
blade      1.02
grass      1.01
switch     0.89
aroo       0.89
gear       0.85
switch     0.84
backs      0.83
switches   0.83
gears      0.76
uese       0.73
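The two logit lists are usually read as the feature's direct effect on the output vocabulary. Below is a minimal sketch of how such lists can be produced. It assumes that each value is the feature's decoder vector projected through the model's unembedding matrix, which the page itself does not state. The function name `top_logit_effects`, the matrix sizes, and all numbers in the toy example are hypothetical, and the toy output does not reproduce the values listed above.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: the effect is the feature's decoder direction projected
    through the unembedding matrix, i.e. w_dec_row @ W_U.
    """
    effects = w_dec_row @ W_U             # one value per vocabulary token
    order = np.argsort(effects)           # indices sorted ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy setup: d_model = 4 and a 6-token vocabulary.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(4, 6))
w_dec_row = rng.normal(size=4)
vocab = ["blade", "grass", "switch", "gear", "za", "USD"]

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=3)
print("Negative:", neg)
print("Positive:", pos)
```

Sorting once and reading both ends of the order gives the most negative and most positive tokens without a second pass over the vocabulary.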
Activation density: 0.019%
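Activation density is commonly taken to be the fraction of token positions on which the feature fires. Under that assumption, 0.019% corresponds to roughly 19 firing positions per 100,000 tokens, or about one in every 5,263. The sketch below uses a hypothetical helper, `activation_density`, and a made-up activation array; the threshold of 0.0 is also an assumption.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Hypothetical sample: 100,000 token positions, the feature fires on 19 of them.
acts = np.zeros(100_000)
acts[:19] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # → 0.019%
```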