INDEX
Explanations
references to the word "grass" at varying activations
references to grass
Negative Logits
icts          -0.73
osexual       -0.69
umbn          -0.69
Somers        -0.66
Heller        -0.65
andem         -0.64
Siem          -0.64
ISM           -0.64
critically    -0.63
prefrontal    -0.62
Positive Logits
roots         1.19
lands         1.17
ho            1.14
wood          1.02
flower        1.00
banks         0.95
nut           0.91
woods         0.90
airst         0.90
land          0.89
Activations Density: 0.016%