Explanations
phrases related to reliance or dependence on particular resources or systems
Negative Logits
terness     -0.82
flies       -0.75
visor       -0.74
Cola        -0.73
ONSORED     -0.73
zon         -0.72
facing      -0.71
ahime       -0.71
FLAG        -0.70
numbered    -0.70
Positive Logits
scraps      0.89
intuition   0.84
brute       0.83
imported    0.83
intermedi   0.81
imports     0.79
instinct    0.79
unreliable  0.78
gimm        0.77
donations   0.76
Activations Density 0.032%